
    PiSDF: Parameterized & Interfaced Synchronous Dataflow for MPSoCs Runtime Reconfiguration

    Dataflow models of computation are widely used for the specification, analysis, and optimization of Digital Signal Processing (DSP) applications. In this talk, we present the Parameterized and Interfaced Synchronous Dataflow (πSDF) model, which addresses the important challenge of managing dynamics in DSP-oriented representations. In addition to capturing application parallelism, an intrinsic feature of dataflow models, πSDF enables the specification of hierarchical and reconfigurable applications. The Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPIDER) is also presented to support the execution of πSDF specifications on heterogeneous Multiprocessor Systems-on-Chips (MPSoCs).
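    As a rough illustration of the reconfiguration idea (not code from the πSDF tooling; the actor names, parameter names, and rate expressions below are invented), production and consumption rates can be written as expressions of parameters and resolved into a plain static graph once parameter values become known at runtime:

```python
# Illustrative sketch only: a parameterized dataflow edge whose rates are
# expressions of parameters, resolved to plain integers once the parameter
# values are set at runtime (the core idea behind runtime reconfiguration).

class ParamEdge:
    def __init__(self, src, dst, prod_expr, cons_expr):
        self.src, self.dst = src, dst
        self.prod_expr, self.cons_expr = prod_expr, cons_expr  # e.g. "2*N"

    def resolve(self, params):
        # Evaluate the rate expressions against the current parameter values.
        prod = eval(self.prod_expr, {}, params)
        cons = eval(self.cons_expr, {}, params)
        return (self.src, self.dst, prod, cons)

edges = [ParamEdge("Read", "Filter", "N", "1"),
         ParamEdge("Filter", "Write", "1", "N")]

# A configuration step fixes N; the resolved graph then behaves like plain SDF.
static_graph = [e.resolve({"N": 8}) for e in edges]
print(static_graph)  # [('Read', 'Filter', 8, 1), ('Filter', 'Write', 1, 8)]
```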

    PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming

    The high-performance Digital Signal Processors (DSPs) currently manufactured by Texas Instruments are heterogeneous multiprocessor architectures. Programming these architectures is a complex task often reserved for specialized engineers, because the bottlenecks of both the algorithm and the architecture must be deeply understood to obtain a sufficiently parallel execution. The objective of the PREESM framework is to simplify the programming of multicore DSP systems by building on dataflow programming methods. The current functionalities of this scalable framework cover memory and timing analysis, as well as automatic deadlock-free code generation. Several tutorials are provided with the tool to quickly introduce C programmers to multicore DSP programming. This paper demonstrates PREESM's capabilities by comparing simulation and execution performance of a stereo matching algorithm prototyped on the TMS320C6678 8-core DSP device.

    Models of Architecture: Reproducible Efficiency Evaluation for Signal Processing Systems

    The current trend in high-performance and embedded signal processing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. In order to take hardware and software design decisions, early evaluations of the system's non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In this paper, we define the notion of Model of Architecture (MoA) and study the combination of a Model of Computation (MoC) and an MoA to provide a design space exploration environment for studying algorithmic and architectural choices. A cost is computed from the mapping of an application, represented by a model conforming to a MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system, such as memory, energy, throughput, or latency.
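    To make the cost decomposition concrete, here is a hedged toy example (the actor names, processing elements, and cycle counts are invented, not taken from the paper): the cost of a mapping is the sum of a processing term per mapped actor and a communication term per transfer that crosses processing elements.

```python
# Toy illustration of an MoA-style cost: one processing term per mapped actor
# plus one communication term per inter-PE transfer, giving a single abstract
# scalar to minimize. All names and numbers are invented.

processing_cost = {("FFT", "DSP0"): 120, ("IFFT", "DSP1"): 130}  # e.g. cycles
communication_cost = {("FFT", "IFFT"): 40}                       # e.g. cycles on the link

mapping = {"FFT": "DSP0", "IFFT": "DSP1"}

cost = sum(processing_cost[(actor, pe)] for actor, pe in mapping.items()) + \
       sum(c for (src, dst), c in communication_cost.items()
           if mapping[src] != mapping[dst])
print(cost)  # 290
```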

    Models of Architecture

    The current trend in high-performance and embedded computing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. In order to take hardware and software design decisions, early evaluations of the system's non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In state-of-the-art Model Driven Engineering (MDE) methods, different communities have developed custom architecture models associated with languages of substantial complexity. This contrasts with Models of Computation (MoCs), which provide abstract representations of an algorithm's behavior as well as tool interoperability. In this report, we define the notion of Model of Architecture (MoA) and study the combination of a MoC and an MoA to provide a design space exploration environment for studying algorithmic and architectural choices. An MoA provides reproducible cost computation for evaluating the efficiency of a system. A new MoA called the Linear System-Level Architecture Model (LSLA) is introduced and compared to state-of-the-art models. LSLA aims at representing hardware efficiency with a linear model. The computed cost results from the mapping of an application, represented by a model conforming to a MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system, such as memory, energy, throughput, or latency.
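    As a hedged sketch of what a linear architecture model in the spirit of LSLA could look like (the coefficients, element names, and quantities below are invented), each processing element and each communication resource contributes a cost that is an affine function of the work or data it handles, and the mapping cost is the sum of these contributions:

```python
# Sketch of a linear system-level cost model: every PE and every communication
# resource is characterized by affine coefficients, and the cost of a mapping is
# the sum of the per-element contributions. All coefficients are invented.

pe_coeffs   = {"GPP": (1.0, 5.0), "DSP": (0.4, 10.0)}  # (cost per work unit, fixed cost)
link_coeffs = {"bus": (0.2, 2.0)}                       # (cost per data unit, fixed cost)

def affine(coeffs, quantity):
    slope, offset = coeffs
    return slope * quantity + offset

work  = [("DSP", 300), ("GPP", 50)]   # (PE, work units) for the mapped actors
moves = [("bus", 128)]                # (link, data units) for inter-PE transfers

cost = sum(affine(pe_coeffs[pe], w) for pe, w in work) + \
       sum(affine(link_coeffs[l], d) for l, d in moves)
print(cost)  # 130 + 55 + 27.6 = 212.6
```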

    Online Scheduling Techniques for Mapping Signal and Image Processing Dataflow Applications onto Embedded Heterogeneous Multicore Architectures

    An important trend in embedded processing is the integration of increasingly many processing elements into Multiprocessor Systems-on-Chip (MPSoCs). This trend is partly due to limitations in the processing power of individual elements, caused by power consumption considerations. At the same time, signal processing applications are becoming increasingly dynamic in terms of their hardware resource requirements, as algorithms grow more sophisticated to reach higher levels of performance. In the design and implementation of multicore signal processing systems, one of the main challenges is to dispatch computational tasks efficiently onto the available processing elements while taking into account dynamic changes in application functionality and resource requirements. Inefficient use can lead to longer processing times and higher energy consumption, making multicore task scheduling a very difficult problem to solve.
    Dataflow process network Models of Computation (MoCs) are widely used in the design of signal processing systems because of their analyzability and their natural expression of parallelism. They decompose application functionality into actors that communicate data exclusively through channels; the interconnection of actors and communication channels is modeled and manipulated as a directed graph, called a dataflow graph. Different dataflow MoCs offer different trade-offs between predictability and expressiveness. In this thesis, we propose a novel scheduling method to address the multicore scheduling challenge. This method takes scheduling decisions at runtime in order to optimize the overall execution time of applications on heterogeneous multicore processing resources. Applications are described using the Parameterized and Interfaced Synchronous DataFlow (PiSDF) MoC. The PiSDF model allows parameterized applications to be described, making it possible for an application's resource requirements to change at runtime. At each execution, the parameterized dataflow graph is transformed into a locally static one, which is then used to schedule the application efficiently with a priori knowledge of its behavior. The proposed scheduling method has been tested and benchmarked on multiple state-of-the-art applications from the computer vision, signal processing, and multimedia domains.
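    As a rough illustration of this kind of runtime decision (a generic list-scheduling step, not the algorithm of the thesis; the task names, core types, and durations are invented), each task produced by unrolling the parameterized graph can be dispatched to the heterogeneous core on which it finishes earliest:

```python
# Toy runtime dispatcher: once the parameterized graph has been unrolled into
# concrete tasks, each task is placed on the processing element that completes
# it earliest, accounting for heterogeneous execution times. All values invented.

exec_time = {                         # per-(task, PE-type) durations
    "filter":  {"GPP": 9, "DSP": 3},
    "fft":     {"GPP": 12, "DSP": 4},
    "display": {"GPP": 2, "DSP": 6},
}
pes = [["GPP", 0.0], ["DSP", 0.0]]    # [PE type, time at which the PE is free]

schedule = []
for task in ["filter", "fft", "display"]:   # order assumed to respect dependencies
    pe = min(pes, key=lambda p: p[1] + exec_time[task][p[0]])  # earliest finish
    pe[1] += exec_time[task][pe[0]]
    schedule.append((task, pe[0], pe[1]))

print(schedule)  # [('filter', 'DSP', 3.0), ('fft', 'DSP', 7.0), ('display', 'GPP', 2.0)]
```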

    SPIDER: A Synchronous Parameterized and Interfaced Dataflow-Based RTOS for Multicore DSPs

    This paper introduces a novel Real-Time Operating System (RTOS) based on a parameterized dataflow Model of Computation (MoC). This RTOS, called the Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPIDER), aims at efficiently scheduling Parameterized and Interfaced Synchronous Dataflow (PiSDF) graphs on multicore architectures. It exploits features of PiSDF to locate locally static regions that exhibit predictable application behavior. Using a multicore signal processing benchmark, this paper demonstrates that the SPIDER runtime can exploit more parallelism than a conventional multicore task scheduler. By comparing experimental results of the SPIDER runtime on an 8-core Texas Instruments Keystone I Digital Signal Processor (DSP) with those obtained from the OpenMP framework, latency improvements of up to 26% are demonstrated.

    Applying the Adaptive Hybrid Flow-Shop Scheduling Method to Schedule a 3GPP LTE Physical Layer Algorithm onto Many-Core Digital Signal Processors

    Multicore Digital Signal Processor (DSP) platforms are currently widely used in telecommunications baseband processing. In the next few years, high-performance DSPs are likely to combine many more DSP cores for signal processing with some General-Purpose Processor (GPP) cores for application control. As the number of cores increases in new DSP platform designs, scheduling applications is becoming a complex operation. Meanwhile, the variability of the scheduled applications also tends to increase as applications become more sophisticated. Such variations require runtime adaptivity of application scheduling. This paper extends previous work on adaptive scheduling by using the Hybrid Flow-Shop (HFS) scheduling method, which models the device architecture as a pipeline of Processing Elements (PEs) with multiple alternate PEs for each pipeline stage. HFS scheduling is applied to the Uplink Physical Layer data processing (PUSCH) of the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) telecommunication standard. The experiments, conducted on an ARM Cortex-A9 GPP, show that the HFS scheduling algorithm has an overhead that increases very slowly with the number of PEs. This makes the method suitable for executing the adaptive scheduling in less than 1 ms for the 501 actors of an LTE PUSCH dataflow description executed on a 256-core architecture.
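    A minimal sketch of the hybrid flow-shop idea follows (the stage names, PE names, and durations are invented; this is not the paper's adaptive algorithm): the platform is modeled as a pipeline of stages, each stage offers several alternate PEs, and every job is assigned, stage by stage, to whichever alternate PE becomes free first.

```python
# Toy hybrid flow-shop dispatch: jobs traverse the pipeline stages in order and,
# at each stage, are assigned to the earliest-free alternate PE. Values invented.

stages = {                              # stage -> [[PE name, time PE becomes free], ...]
    "demap":  [["dsp0", 0.0], ["dsp1", 0.0]],
    "equal":  [["dsp2", 0.0]],
    "decode": [["acc0", 0.0], ["acc1", 0.0]],
}
durations = {"demap": 2.0, "equal": 1.5, "decode": 3.0}  # per-stage processing time

def dispatch(n_jobs=4, release=0.0):
    finish_times = []
    for _ in range(n_jobs):
        t = release                                       # data availability of this job
        for stage in ["demap", "equal", "decode"]:
            pe = min(stages[stage], key=lambda p: p[1])   # earliest-free alternate PE
            start = max(t, pe[1])
            pe[1] = start + durations[stage]
            t = pe[1]
        finish_times.append(t)
    return finish_times

print(dispatch())  # completion time of each job on the pipelined platform
```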

    Software HEVC video decoder: towards an energy saving for mobile applications


    Delays and States in Dataflow Models of Computation

    Dataflow Models of Computation (MoCs) have proven to be efficient means for modeling the computational aspects of Cyber-Physical Systems (CPS). Over the years, diverse MoCs have been proposed that offer trade-offs between expressivity, conciseness, predictability, and reconfigurability. While efficient for modeling coarse-grain data and task parallelism, state-of-the-art dataflow MoCs lack the semantics to benefit from the finer-grained parallelism offered by hierarchically modeled nested loops. In this paper, a meta-model called State-Aware Dataflow (SAD) is proposed that enhances a dataflow MoC with new semantics to take advantage of such nested-loop parallelism. SAD extends the semantics of the targeted MoC with an unambiguous data persistence scope. The extended expressiveness and conciseness brought by the SAD meta-model are demonstrated with a reinforcement learning use case.
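    As background for the relation between delays and states, the sketch below illustrates only the classical interpretation of a dataflow delay as initial tokens on a FIFO, which lets an actor read back its own previous output and thus behave statefully (a plain illustrative example, not the SAD semantics; the names and values are invented):

```python
# Illustrative only: in classical SDF, a "delay" is a set of initial tokens on a
# FIFO, which is how a feedback edge (and hence actor state) is usually expressed.

from collections import deque

fifo = deque([0])            # one delay token: the feedback edge starts non-empty

def accumulate(x, prev):
    return x + prev          # actor whose "state" is its previous output value

outputs = []
for sample in [1, 2, 3, 4]:
    prev = fifo.popleft()    # consume the delayed (previous) output
    y = accumulate(sample, prev)
    fifo.append(y)           # produce the value read one iteration later
    outputs.append(y)

print(outputs)  # running sum: [1, 3, 6, 10]
```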